Search Results: "Matthew Garrett"

21 February 2021

Matthew Garrett: Making hibernation work under Linux Lockdown

Linux draws a distinction between code running in kernel (kernel space) and applications running in userland (user space). This is enforced at the hardware level - in x86-speak[1], kernel space code runs in ring 0 and user space code runs in ring 3[2]. If you're running in ring 3 and you attempt to touch memory that's only accessible in ring 0, the hardware will raise a fault. No matter how privileged your ring 3 code, you don't get to touch ring 0.

Kind of. In theory. Traditionally this wasn't well enforced. At the most basic level, since root can load kernel modules, you could just build a kernel module that performed any kernel modifications you wanted and then have root load it. Technically user space code wasn't modifying kernel space code, but the difference was pretty semantic rather than useful. But it got worse - root could also map memory ranges belonging to PCI devices[3], and if the device could perform DMA you could just ask the device to overwrite bits of the kernel[4]. Or root could modify special CPU registers ("Model Specific Registers", or MSRs) that alter CPU behaviour via the /dev/msr interface, and compromise the kernel boundary that way.

It turns out that there were a number of ways root was effectively equivalent to ring 0, and the boundary was more about reliability (ie, a process running as root that ends up misbehaving should still only be able to crash itself rather than taking down the kernel with it) than security. After all, if you were root you could just replace the on-disk kernel with a backdoored one and reboot. Going deeper, you could replace the bootloader with one that automatically injected backdoors into a legitimate kernel image. We didn't have any way to prevent this sort of thing, so attempting to harden the root/kernel boundary wasn't especially interesting.

In 2012 Microsoft started requiring vendors ship systems with UEFI Secure Boot, a firmware feature that allowed[5] systems to refuse to boot anything without an appropriate signature. This not only enabled the creation of a system that drew a strong boundary between root and kernel, it arguably required one - what's the point of restricting what the firmware will stick in ring 0 if root can just throw more code in there afterwards? What ended up as the Lockdown Linux Security Module provides the tooling for this, blocking userspace interfaces that can be used to modify the kernel and enforcing that any modules have a trusted signature.

But that comes at something of a cost. Most of the features that Lockdown blocks are fairly niche, so the direct impact of having it enabled is small. Except that it also blocks hibernation[6], and it turns out some people were using that. The obvious question is "what does hibernation have to do with keeping root out of kernel space", and the answer is a little convoluted and is tied into how Linux implements hibernation. Basically, Linux saves system state into the swap partition and modifies the header to indicate that there's a hibernation image there instead of swap. On the next boot, the kernel sees the header indicating that it's a hibernation image, copies the contents of the swap partition back into RAM, and then jumps back into the old kernel code. What ensures that the hibernation image was actually written out by the kernel? Absolutely nothing, which means a motivated attacker with root access could turn off swap, write a hibernation image to the swap partition themselves, and then reboot. The kernel would happily resume into the attacker's image, giving the attacker control over what gets copied back into kernel space.

This is annoying, because normally when we think about attacks on swap we mitigate it by requiring an encrypted swap partition. But in this case, our attacker is root, and so already has access to the plaintext version of the swap partition. Disk encryption doesn't save us here. We need some way to verify that the hibernation image was written out by the kernel, not by root. And thankfully we have some tools for that.

Trusted Platform Modules (TPMs) are cryptographic coprocessors[7] capable of doing things like generating encryption keys and then encrypting things with them. You can ask a TPM to encrypt something with a key that's tied to that specific TPM - the OS has no access to the decryption key, and nor does any other TPM. So we can have the kernel generate an encryption key, encrypt part of the hibernation image with it, and then have the TPM encrypt it. We store the encrypted copy of the key in the hibernation image as well. On resume, the kernel reads the encrypted copy of the key, passes it to the TPM, gets the decrypted copy back and is able to verify the hibernation image.

That's great! Except root can do exactly the same thing. This tells us the hibernation image was generated on this machine, but doesn't tell us that it was done by the kernel. We need some way to be able to differentiate between keys that were generated in kernel and ones that were generated in userland. TPMs have the concept of "localities" (effectively privilege levels) that would be perfect for this. Userland is only able to access locality 0, so the kernel could simply use locality 1 to encrypt the key. Unfortunately, despite trying pretty hard, I've been unable to get localities to work. The motherboard chipset on my test machines simply doesn't forward any accesses to the TPM unless they're for locality 0. I needed another approach.

TPMs have a set of Platform Configuration Registers (PCRs), intended for keeping a record of system state. The OS isn't able to modify the PCRs directly. Instead, the OS provides a cryptographic hash of some material to the TPM. The TPM takes the existing PCR value, appends the new hash to that, and then stores the hash of the combination in the PCR - a process called "extension". This means that the new value of the TPM depends not only on the value of the new data, it depends on the previous value of the PCR - and, in turn, that previous value depended on its previous value, and so on. The only way to get to a specific PCR value is to either (a) break the hash algorithm, or (b) perform exactly the same sequence of writes. On system reset the PCRs go back to a known value, and the entire process starts again.

Some PCRs are different. PCR 23, for example, can be reset back to its original value without resetting the system. We can make use of that. The first thing we need to do is to prevent userland from being able to reset or extend PCR 23 itself. All TPM accesses go through the kernel, so this is a simple matter of parsing the write before it's sent to the TPM and returning an error if it's a sensitive command that would touch PCR 23. We now know that any change in PCR 23's state will be restricted to the kernel.

When we encrypt material with the TPM, we can ask it to record the PCR state. This is given back to us as metadata accompanying the encrypted secret. Along with the metadata is an additional signature created by the TPM, which can be used to prove that the metadata is both legitimate and associated with this specific encrypted data. In our case, that means we know what the value of PCR 23 was when we encrypted the key. That means that if we simply extend PCR 23 with a known value in-kernel before encrypting our key, we can look at the value of PCR 23 in the metadata. If it matches, the key was encrypted by the kernel - userland can create its own key, but it has no way to extend PCR 23 to the appropriate value first. We now know that the key was generated by the kernel.

But what if the attacker is able to gain access to the encrypted key? Let's say a kernel bug is hit that prevents hibernation from resuming, and you boot back up without wiping the hibernation image. Root can then read the key from the partition, ask the TPM to decrypt it, and then use that to create a new hibernation image. We probably want to prevent that as well. Fortunately, when you ask the TPM to encrypt something, you can ask that the TPM only decrypt it if the PCRs have specific values. "Sealing" material to the TPM in this way allows you to block decryption if the system isn't in the desired state. So, we define a policy that says that PCR 23 must have the same value at resume as it did on hibernation. On resume, the kernel resets PCR 23, extends it to the same value it did during hibernation, and then attempts to decrypt the key. Afterwards, it resets PCR 23 back to the initial value. Even if an attacker gains access to the encrypted copy of the key, the TPM will refuse to decrypt it.

And that's what this patchset implements. There's one fairly significant flaw at the moment, which is simply that an attacker can just reboot into an older kernel that doesn't implement the PCR 23 blocking and set up state by hand. Fortunately, this can be avoided using another aspect of the boot process. When you boot something via UEFI Secure Boot, the signing key used to verify the booted code is measured into PCR 7 by the system firmware. In the Linux world, the Shim bootloader then measures any additional keys that are used. By either using a new key to tag kernels that have support for the PCR 23 restrictions, or by embedding some additional metadata in the kernel that indicates the presence of this feature and measuring that, we can have a PCR 7 value that verifies that the PCR 23 restrictions are present. We then seal the key to PCR 7 as well as PCR 23, and if an attacker boots into a kernel that doesn't have this feature the PCR 7 value will be different and the TPM will refuse to decrypt the secret.

While there's a whole bunch of complexity here, the process should be entirely transparent to the user. The current implementation requires a TPM 2, and I'm not certain whether TPM 1.2 provides all the features necessary to do this properly - if so, extending it shouldn't be hard, but also all systems shipped in the past few years should have a TPM 2, so that's going to depend on whether there's sufficient interest to justify the work. And we're also at the early days of review, so there's always the risk that I've missed something obvious and there are terrible holes in this. And, well, given that it took almost 8 years to get the Lockdown patchset into mainline, let's not assume that I'm good at landing security code.

[1] Other architectures use different terminology here, such as "supervisor" and "user" mode, but it's broadly equivalent
[2] In theory rings 1 and 2 would allow you to run drivers with privileges somewhere between full kernel access and userland applications, but in reality we just don't talk about them in polite company
[3] This is how graphics worked in Linux before kernel modesetting turned up. XFree86 would just map your GPU's registers into userland and poke them directly. This was not a huge win for stability
[4] IOMMUs can help you here, by restricting the memory PCI devices can DMA to or from. The kernel then gets to allocate ranges for device buffers and configure the IOMMU such that the device can't DMA to anything else. Except that region of memory may still contain sensitive material such as function pointers, and attacks like this can still cause you problems as a result.
[5] This describes why I'm using "allowed" rather than "required" here
[6] Saving the system state to disk and powering down the platform entirely - significantly slower than suspending the system while keeping state in RAM, but also resilient against the system losing power.
[7] With some handwaving around "coprocessor". TPMs can't be part of the OS or the system firmware, but they don't technically need to be an independent component. Intel have a TPM implementation that runs on the Management Engine, a separate processor built into the motherboard chipset. AMD have one that runs on the Platform Security Processor, a small ARM core built into their CPU. Various ARM implementations run a TPM in Trustzone, a special CPU mode that (in theory) is able to access resources that are entirely blocked off from anything running in the OS, kernel or otherwise.

comment count unavailable comments

27 July 2020

Matthew Garrett: Filesystem deduplication is a sidechannel

First off - nothing I'm going to talk about in this post is novel or overly surprising, I just haven't found a clear writeup of it before. I'm not criticising any design decisions or claiming this is an important issue, just raising something that people might otherwise be unaware of.

With that out of the way: Automatic deduplication of data is a feature of modern filesystems like zfs and btrfs. It takes two forms - inline, where the filesystem detects that data being written to disk is identical to data that already exists on disk and simply references the existing copy rather than, and offline, where tooling retroactively identifies duplicated data and removes the duplicate copies (zfs supports inline deduplication, btrfs only currently supports offline). In a world where disks end up with multiple copies of cloud or container images, deduplication can free up significant amounts of disk space.

What's the security implication? The problem is that deduplication doesn't recognise ownership - if two users have copies of the same file, only one copy of the file will be stored[1]. So, if user a stores a file, the amount of free space will decrease. If user b stores another copy of the same file, the amount of free space will remain the same. If user b is able to check how much free space is available, user b can determine whether the file already exists.

This doesn't seem like a huge deal in most cases, but it is a violation of expected behaviour (if user b doesn't have permission to read user a's files, user b shouldn't be able to determine whether user a has a specific file). But we can come up with some convoluted cases where it becomes more relevant, such as law enforcement gaining unprivileged access to a system and then being able to demonstrate that a specific file already exists on that system. Perhaps more interestingly, it's been demonstrated that free space isn't the only sidechannel exposed by deduplication - deduplication has an impact on access timing, and can be used to infer the existence of data across virtual machine boundaries.

As I said, this is almost certainly not something that matters in most real world scenarios. But with so much discussion of CPU sidechannels over the past couple of years, it's interesting to think about what other features also end up leaking information in ways that may not be obvious.

(Edit to add: deduplication isn't enabled on zfs by default and is explicitly triggered on btrfs, so unless it's something you've enabled then this isn't something that affects you)

[1] Deduplication is usually done at the block level rather than the file level, but given zfs's support for variable sized blocks, identical files should be deduplicated even if they're smaller than the maximum record size

comment count unavailable comments

24 June 2020

Matthew Garrett: Making my doorbell work

I recently moved house, and the new building has a Doorbird to act as a doorbell and open the entrance gate for people. There's a documented local control API (no cloud dependency!) and a Home Assistant integration, so this seemed pretty straightforward.

Unfortunately not. The Doorbird is on separate network that's shared across the building, provided by Monkeybrains. We're also a Monkeybrains customer, so our network connection is plugged into the same router and antenna as the Doorbird one. And, as is common, there's port isolation between the networks in order to avoid leakage of information between customers. Rather perversely, we are the only people with an internet connection who are unable to ping my doorbell.

I spent most of the past few weeks digging myself out from under a pile of boxes, but we'd finally reached the point where spending some time figuring out a solution to this seemed reasonable. I spent a while playing with port forwarding, but that wasn't ideal - the only server I run is in the UK, and having packets round trip almost 11,000 miles so I could speak to something a few metres away seemed like a bad plan. Then I tried tethering an old Android device with a data-only SIM, which worked fine but only in one direction (I could see what the doorbell could see, but I couldn't get notifications that someone had pushed a button, which was kind of the point here).

So I went with the obvious solution - I added a wifi access point to the doorbell network, and my home automation machine now exists on two networks simultaneously (nmcli device modify wlan0 ipv4.never-default true is the magic for "ignore the gateway that the DHCP server gives you" if you want to avoid this), and I could now do link local service discovery to find the doorbell if it changed addresses after a power cut or anything. And then, like magic, everything worked - I got notifications from the doorbell when someone hit our button.

But knowing that an event occurred without actually doing something in response seems fairly unhelpful. I have a bunch of Chromecast targets around the house (a mixture of Google Home devices and Chromecast Audios), so just pushing a message to them seemed like the easiest approach. Home Assistant has a text to speech integration that can call out to various services to turn some text into a sample, and then push that to a media player on the local network. You can group multiple Chromecast audio sinks into a group that then presents as a separate device on the network, so I could then write an automation to push audio to the speaker group in response to the button being pressed.

That's nice, but it'd also be nice to do something in response. The Doorbird exposes API control of the gate latch, and Home Assistant exposes that as a switch. I'm using Home Assistant's Google Assistant integration to expose devices Home Assistant knows about to voice control. Which means when I get a house-wide notification that someone's at the door I can just ask Google to open the door for them.

So. Someone pushes the doorbell. That sends a signal to a machine that's bridged onto that network via an access point. That machine then sends a protobuf command to speakers on a separate network, asking them to stream a sample it's providing. Those speakers call back to that machine, grab the sample and play it. At this point, multiple speakers in the house say "Someone is at the door". I then say "Hey Google, activate the front gate" - the device I'm closest to picks this up and sends it to Google, where something turns my speech back into text. It then looks at my home structure data and realises that the "Front Gate" device is associated with my Home Assistant integration. It then calls out to the home automation machine that received the notification in the first place, asking it to trigger the front gate relay. That device calls out to the Doorbird and asks it to open the gate. And now I have functionality equivalent to a doorbell that completes a circuit and rings a bell inside my home, and a button inside my home that completes a circuit and opens the gate, except it involves two networks inside my building, callouts to the cloud, at least 7 devices inside my home that are running Linux and I really don't want to know how many computational cycles.

The future is wonderful.

(I work for Google. I do not work on any of the products described in this post. Please god do not ask me how to integrate your IoT into any of this)

comment count unavailable comments

21 April 2020

Matthew Garrett: Linux kernel lockdown, integrity, and confidentiality

The Linux kernel lockdown patches were merged into the 5.4 kernel last year, which means they're now part of multiple distributions. For me this was a 7-year journey, which means it's easy to forget that others aren't as invested in the code as I am. Here's what these patches are intended to achieve, why they're implemented in the current form and what people should take into account when deploying the feature.

Root is a user - a privileged user, but nevertheless a user. Root is not identical to the kernel. Processes running as root still can't dereference addresses that belong to the kernel, are still subject to the whims of the scheduler and so on. But historically that boundary has been very porous. Various interfaces make it straightforward for root to modify kernel code (such as loading modules or using /dev/mem), while others make it less straightforward (being able to load new ACPI tables that can cause the ACPI interpreter to overwrite the kernel, for instance). In the past that wasn't seen as a significant issue, since there were no widely deployed mechanisms for verifying the integrity of the kernel in the first place. But once UEFI secure boot became widely deployed, this was a problem. If you verify your boot chain but allow root to modify that kernel, the benefits of the verified boot chain are significantly reduced. Even if root can't modify the on-disk kernel, root can just hot-patch the kernel and then make this persistent by dropping a binary that repeats the process on system boot.

Lockdown is intended as a mechanism to avoid that, by providing an optional policy that closes off interfaces that allow root to modify the kernel. This was the sole purpose of the original implementation, which maps to the "integrity" mode that's present in the current implementation. Kernels that boot in lockdown integrity mode prevent even root from using these interfaces, increasing assurances that the running kernel corresponds to the booted kernel. But lockdown's functionality has been extended since then. There are some use cases where preventing root from being able to modify the kernel isn't enough - the kernel may hold secret information that even root shouldn't be permitted to see (such as the EVM signing key that can be used to prevent offline file modification), and the integrity mode doesn't prevent that. This is where lockdown's confidentiality mode comes in. Confidentiality mode is a superset of integrity mode, with additional restrictions on root's ability to use features that would allow the inspection of any kernel memory that could contain secrets.

Unfortunately right now we don't have strong mechanisms for marking which bits of kernel memory contain secrets, so in order to achieve that we end up blocking access to all kernel memory. Unsurprisingly, this compromises people's ability to inspect the kernel for entirely legitimate reasons, such as using the various mechanisms that allow tracing and probing of the kernel.

How can we solve this? There's a few ways:
  1. Introduce a mechanism to tag memory containing secrets, and only restrict accesses to this. I've tried to do something similar for userland and it turns out to be hard, but this is probably the best long-term solution.
  2. Add support for privileged applications with an appropriate signature that implement policy on the userland side. This is actually possible already, though not straightforward. Lockdown is implemented in the LSM layer, which means the policy can be imposed using any other existing LSM. As an example, we could use SELinux to impose the confidentiality restrictions on most processes but permit processes with a specific SELinux context to use them, and then use EVM to ensure that any process running in that context has a legitimate signature. This is quite a few hoops for a general purpose distribution to jump through.
  3. Don't use confidentiality mode in general purpose distributions. The attacks it protects against are mostly against special-purpose use cases, and they can enable it themselves.

My recommendation is for (3), and I'd encourage general purpose distributions that enable lockdown to do so only in integrity mode rather than confidentiality mode. The cost of confidentiality mode is just too high compared to the benefits it provides. People who need confidentiality mode probably already know that they do, and should be in a position to enable it themselves and handle the consequences.

comment count unavailable comments

13 April 2020

Matthew Garrett: Implementing support for advanced DPTF policy in Linux

Intel's Dynamic Platform and Thermal Framework (DPTF) is a feature that's becoming increasingly common on highly portable Intel-based devices. The adaptive policy it implements is based around the idea that thermal management of a system is becoming increasingly complicated - the appropriate set of cooling constraints to place on a system may differ based on a whole bunch of criteria (eg, if a tablet is being held vertically rather than lying on a table, it's probably going to be able to dissipate heat more effectively, so you should impose different constraints). One way of providing these criteria to the OS is to embed them in the system firmware, allowing an OS-level agent to read that and then incorporate OS-level knowledge into a final policy decision.

Unfortunately, while Intel have released some amount of support for DPTF on Linux, they haven't included support for the adaptive policy. And even more annoyingly, many modern laptops run in a heavily conservative thermal state if the OS doesn't support the adaptive policy, meaning that the CPU throttles down extremely quickly and the laptop runs excessively slowly.

It's been a while since I really got stuck into a laptop reverse engineering project, and I don't have much else to do right now, so I've been working on this. It's been a combination of examining what source Intel have released, reverse engineering the Windows code and staring hard at hex dumps until they made some sort of sense. Here's where I am.

There's two main components to the adaptive policy - the adaptive conditions table (APCT) and the adaptive actions table (APAT). The adaptive conditions table contains a set of condition sets, with up to 10 conditions in each condition set. A condition is something like "is the battery above a certain charge", "is this temperature sensor below a certain value", "is the lid open or closed", "is the machine upright or horizontal" and so on. Each condition set is evaluated in turn - if all the conditions evaluate to true, the condition set's target is implemented. If not, we move onto the next condition set. There will typically be a fallback condition set to catch the case where none of the other condition sets evaluate to true.

The action table contains sets of actions associated with a specific target. Once we've picked a target by evaluating the conditions, we execute the actions that have a corresponding target. Actions are things like "Set the CPU power limit to this value" or "Load a passive policy table". Passive policy tables are simply tables associating sensors with devices and an associated temperature limit. If the limit is exceeded, the associated device should be asked to reduce its heat output until the situation is resolved.

There's a couple of twists. The first is the OEM conditions. These are conditions that refer to values that are exposed by the firmware and are otherwise entirely opaque - the firmware knows what these mean, but we don't, so conditions that rely on these values are magical. They could be temperature, they could be power consumption, they could be SKU variations. We just don't know. The other is that older versions of the APCT table didn't include a reference to a device - ie, if you specified a condition based on a temperature, you had no way to express which temperature sensor to use. So, instead, you specified a condition that's greater than 0x10000, which tells the agent to look at the APPC table to extract the device and the appropriate actual condition.

Intel already have a Linux app called Thermal Daemon that implements a subset of this - you're supposed to run the binary-only dptfxtract against your firmware to parse a few bits of the DPTF tables, and it writes out an XML file that Thermal Daemon makes use of. Unfortunately it doesn't handle most of the more interesting bits of the adaptive performance policy, so I've spent the past couple of days extending it to do so and to remove the proprietary dependency.

My current work is here - it requires a couple of kernel patches (that are in the patches directory), and it only supports a very small subset of the possible conditions. It's also entirely possible that it'll do something inappropriate and cause your computer to melt - none of this is publicly documented, I don't have access to the spec and you're relying on my best guesses in a lot of places. But it seems to behave roughly as expected on the one test machine I have here, so time to get some wider testing?

comment count unavailable comments

6 March 2020

Reproducible Builds: Reproducible Builds in February 2020

Welcome to the February 2020 report from the Reproducible Builds project. One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. However, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into ostensibly secure software during the various compilation and distribution processes. The motivation behind the reproducible builds effort is to provide the ability to demonstrate these binaries originated from a particular, trusted, source release: if identical results are generated from a given source in all circumstances, reproducible builds provides the means for multiple third-parties to reach a consensus on whether a build was compromised via distributed checksum validation or some other scheme. In this month s report, we cover:

If you are interested in contributing to the project, please visit our Contribute page on our website.

Media coverage & upstream news Omar Navarro Leija, a PhD student at the University Of Pennsylvania, published a paper entitled Reproducible Containers that describes in detail the workings of a new user-space container tool called DetTrace:
All computation that occurs inside a DetTrace container is a pure function of the initial filesystem state of the container. Reproducible containers can be used for a variety of purposes, including replication for fault-tolerance, reproducible software builds and reproducible data analytics. We use DetTrace to achieve, in an automatic fashion, reproducibility for 12,130 Debian package builds, containing over 800 million lines of code, as well as bioinformatics and machine learning workflows.
There was also considerable discussion on our mailing list regarding this research and a presentation based on the paper will occur at the ASPLOS 2020 conference between March 16th 20th in Lausanne, Switzerland. The many virtues of Reproducible Builds were touted as benefits for software compliance in a talk at FOSDEM 2020, debating whether the Careful Inventory of Licensing Bill of Materials Have Impact of FOSS License Compliance which pitted Jeff McAffer and Carol Smith against Bradley Kuhn and Max Sills. (~47 minutes in). Nobuyoshi Nakada updated the canonical implementation of the Ruby programming language a change such that filesystem globs (ie. calls to list the contents of filesystem directories) will henceforth be sorted in ascending order. Without this change, the underlying nondeterministic ordering of the filesystem is exposed to the language which often results in an unreproducible build. Vagrant Cascadian reported on our mailing list regarding a quick reproducible test for the GNU Guix distribution, which resulted in 81.9% of packages registering as reproducible in his installation:
$ guix challenge --verbose --diff=diffoscope ...
2,463 store items were analyzed:
  - 2,016 (81.9%) were identical
  - 37 (1.5%) differed
  - 410 (16.6%) were inconclusive
Jeremiah Orians announced on our mailing list the release of a number of tools related to cross-compilation such as M2-Planet and mescc-tools-seed. This project attemps a full bootstrap of a cross-platform compiler for the C programming language (written in C itself) from hex, the ultimate goal being able to demonstrate fully-bootstrapped compiler from hex to the GCC GNU Compiler Collection. This has many implications in and around Ken Thompson s Trusting Trust attack outlined in Thompson s 1983 Turing Award Lecture. Twitter user @TheYoctoJester posted an executive summary of reproducible builds in the Yocto Project: Finally, Reddit user tofflos posted to the /r/Java subreddit asking about how to achieve reproducible builds with Maven and Chris Lamb noticed that the Linux kernel documentation about reproducible builds of it is available on the kernel.org homepages in an attractive HTML format.

Distribution work

Debian Chris Lamb created a merge request for the core debian-installer package to allow all arguments and options from sources.list files (such as [check-valid-until=no] , etc.) in order that we can test the reproducibility of the installer images on the Reproducible Builds own testing infrastructure. (#13) Thorsten Glaser followed-up to a bug filed against the dpkg-source component that was originally filed in late 2015 that claims that the build tool does not respect permissions when unpacking tarballs if the umask is set to 0002. Matthew Garrett posted to the debian-devel mailing list on the topic of Producing verifiable initramfs images as part of a wider conversation on being able to trust the entire software stack on our computers. 59 reviews of Debian packages were added, 30 were updated and 42 were removed this month adding to our knowledge about identified issues. Many issue types were noticed and categorised by Chris Lamb, including:

openSUSE In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as provided the following patches:

Software development

diffoscope diffoscope is our in-depth and content-aware diff-like utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of nondeterministic behaviour. Chris Lamb made the following changes this month, including uploading version 137 to Debian:
  • The sng image utility appears to return with an exit code of 1 if there are even minor errors in the file. (#950806)
  • Also extract classes2.dex, classes3.dex from .apk files extracted by apktool. (#88)
  • No need to use str.format if we are just returning the string. [ ]
  • Add generalised support for ignoring returncodes [ ] and move special-casing of returncodes in zip to use Command.VALID_RETURNCODES. [ ]

Other tools disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues. This month, Vagrant Cascadian updated the Vcs-Git to specify the debian packaging branch. [ ] reprotest is our end-user tool to build same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, versions 0.7.13 and 0.7.14 were uploaded to Debian unstable by Holger Levsen after Vagrant Cascadian added support for GNU Guix [ ].

Project documentation & website There was more work performed on our documentation and website this month. Bernhard M. Wiedemann added a Java Gradle Build Tool snippet to the SOURCE_DATE_EPOCH documentation [ ] and normalised various terms to unreproducible [ ]. Chris Lamb added a Meson.build example [ ] and improved the documentation for the CMake [ ] to the SOURCE_DATE_EPOCH documentation, replaced anyone can with anyone may as, well, not everyone has the resources, skills, time or funding to actually do what it refers to [ ] and improved the pre-processing for our report generation [ ][ ][ ][ ] etc. In addition, Holger Levsen updated our news page to improve the list of reports [ ], added an explicit mention of the weekly news time span [ ] and reverted sorting of news entries to have latest on top [ ] and Mattia Rizzolo added Codethink as a non-fiscal sponsor [ ] and lastly Tianon Gravi added a Docker Images link underneath the Debian project on our Projects page [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Vagrant Cascadian submitted patches via the Debian bug tracking system targeting the packages the Civil Infrastructure Platform has identified via the CIP and CIP build depends package sets:

Testing framework We operate a fully-featured and comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made by Holger Levsen: In addition, Mattia Rizzolo added an Apache web server redirect for buildinfos.debian.net [ ] and reverted the reshuffling of arm64 architecture builders [ ]. The usual build node maintenance was performed by Holger Levsen, Mattia Rizzolo [ ][ ] and Vagrant Cascadian.

Getting in touch If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month s report was written by Bernhard M. Wiedemann, Chris Lamb and Holger Levsen. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

17 August 2017

Reproducible builds folks: Reproducible Builds: Weekly report #120

Here's what happened in the Reproducible Builds effort between Sunday 6th and Saturday 12th August 2017: Notes about reviews of unreproducible packages 13 package reviews have been added, 7 have been updated and 34 have been removed in this week, adding to our knowledge about identified issues. Packages reviewed and fixed, and reproducibility related bugs filed Upstream packages: Other bugs filed diffoscope development trydiffoscope development tests.reproducible-builds.org Misc. This week's edition was written by Chris Lamb & Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

12 August 2017

Bits from Debian: DebConf17 closes in Montreal and DebConf18 dates announced

DebConf17 group photo - click to enlarge Today, Saturday 12 August 2017, the annual Debian Developers and Contributors Conference came to a close. With over 405 people attending from all over the world, and 169 events including 89 talks, 61 discussion sessions or BoFs, 6 workshops and 13 other activities, DebConf17 has been hailed as a success. Highlights included DebCamp with 117 participants, the Open Day,
where events of interest to a broader audience were offered, talks from invited speakers (Deb Nicholson, Matthew Garrett and Katheryn Sutter), the traditional Bits from the DPL, lightning talks and live demos and the announcement of next year's DebConf (DebConf18 in Hsinchu, Taiwan). The schedule has been updated every day, including 32 ad-hoc new activities, planned
by attendees during the whole conference. For those not able to attend, talks and sessions were recorded and live streamed, and videos are being made available at the Debian meetings archive website. Many sessions also facilitated remote participation via IRC or a collaborative pad. The DebConf17 website will remain active for archive purposes, and will continue to offer links to the presentations and videos of talks and events. Next year, DebConf18 will be held in Hsinchu, Taiwan, from 29 July 2018 until 5 August 2018. It will be the first DebConf held in Asia. For the days before DebConf the local organisers will again set up DebCamp (21 July - 27 July), a session for some intense work on improving the distribution, and organise the Open Day on 28 July 2018, aimed at the general public. DebConf is committed to a safe and welcome environment for all participants. See the DebConf Code of Conduct and the Debian Code of Conduct for more details on this. Debian thanks the commitment of numerous sponsors to support DebConf17, particularly our Platinum Sponsors Savoir-Faire Linux, Hewlett Packard Enterprise, and Google. About Savoir-faire Linux Savoir-faire Linux is a Montreal-based Free/Open-Source Software company with offices in Quebec City, Toronto, Paris and Lyon. It offers Linux and Free Software integration solutions in order to provide performance, flexibility and independence for its clients. The company actively contributes to many free software projects, and provides mirrors of Debian, Ubuntu, Linux and others. About Hewlett Packard Enterprise Hewlett Packard Enterprise (HPE) is one of the largest computer companies in the world, providing a wide range of products and services, such as servers, storage, networking, consulting and support, software, and financial services. HPE is also a development partner of Debian, and provides hardware for port development, Debian mirrors, and other Debian services. About Google Google is one of the largest technology companies in the world, providing a wide range of Internet-related services and products as online advertising technologies, search, cloud computing, software, and hardware. Google has been supporting Debian by sponsoring DebConf since more than ten years, at gold level since DebConf12, and at platinum level for this DebConf17.

Sam Hartman: Debian: a Commons of Innovation

I recently returned from Debconf. This year at Debconf, Matthew Garrett gave a talk about the next twenty years in free software. In his talk he raised concerns that Debian might not be relevant in that ecosystem and talked about some of the trends that contribute to his concerns.
I was talking to Marga after the talk and she said that Debian used to be a lot more innovative than it is today.
My initial reaction was doubt; what she said didn't feel right to me. At the time I didn't have a good answer. Since then I've been pondering the issue, and I think I have a partial answer to both Marga and Matthew and so I'll share it here.
In the beginning Debian focused on a lot of technical innovations related to bringing an operating system together. We didn't understand how to approach builds and build dependencies in a uniform manner. Producing packages in a clean environment was hard. We didn't understand what we wanted out of packages in terms of a uniform approach to configuration handling and upgrades. To a large extent we've solved those problems.
However, as the community has grown, our interests are more diverse. Our users and free software (and the operating system we build together) are what bring us together: we still have a central focus. However, no one technical project captures us all. There's still significant technical innovation in the Debian ecosystem. That innovation happens in Debian teams, companies and organizations that interact with the Debian community. We saw several talks about such innovation this year. I found the talk about ostree and flatpak interesting, especially because it focused on people in the broader Debian ecosystem valuing Debian along with some of the same technologies that Matthew is worried will undermine our relevance.
Matthew talked about how Debian ends up being a man-in-the-middle. We're between users and developers. we're between distributions and upstreams. Users are frustrated because we hold back the latest version of software they want from getting to them. Developers are frustrated because we present our users with old versions of their software configured not as they would like, combined with different dependencies than they expect.
All these weaknesses are real.
However, I think Debian-in-the-middle is our greatest strength both on the technical and social front.
I value Debian because I get a relatively uniform interface to the software I use. I can take one approach to reporting bugs whether they are upstream or Debian specific. I expect the software to behave in uniform ways with regard to things covered by policy. I know that I'm not going to have to configure multiple different versions of core dependencies; for the most part system services are shared. When Debian has value it's because our users want those things we provide. Debian has also reviewed every source file in the software we ship to understand the license and license compatibility. As a free software supporter and as someone who consumes software in commercial context, that value alone is enormous.
The world has evolved and we're facing technologies that provide different models. They've been coming for years: Python, Ruby, Java, Perl and others have been putting together their own commons of software. They have all been working to provide virtualization to isolate one program (and its dependencies) from another. Containerization takes that to the next level. Sometimes that's what our users want.
We haven't figured out what the balances are, how we fit into this new world. However, I disagree with the claim that we aren't even discussing the problem. I think we're trying a lot of things off in our own little technical groups. We're just getting to the level of critical mass of understanding where we can take advantage of Debian's modern form of innovation.
Because here's the thing. Debian's innovation now is social, not technical. Just as we're in the middle technically, we're in the middle socially. Upstreams, developers, users, derivatives, and all the other members of our community work together. we're a place where we can share technology, explore solutions, and pull apart common elements. This is the first Debconf where it felt like we'd explored some of these trends enough to start understanding how they might fit together in a whole. Seven years ago, it felt like we were busy being convinced the Java folks were wrong-headed. A couple of years later, it felt like we were starting to understand our users' desires that let to models different than packaging, but we didn't have any thoughts. At least in my part of the hallway it sounded like people were starting to think about how they might fit parts together and what the tradeoffs would be.
Yes, Matthew's talk doubtless sparked some of that. I think he gave us a well deserved and important wake-up call. However, I was excited by the discussion prior to Thursday.
What I'm taking a way is that Debian is valuable when there's a role in the middle. Both socially and technically we should capitalize on situations where something between makes things better and get out of the way where it does not.

18 July 2017

Matthew Garrett: Avoiding TPM PCR fragility using Secure Boot

In measured boot, each component of the boot process is "measured" (ie, hashed and that hash recorded) in a register in the Trusted Platform Module (TPM) build into the system. The TPM has several different registers (Platform Configuration Registers, or PCRs) which are typically used for different purposes - for instance, PCR0 contains measurements of various system firmware components, PCR2 contains any option ROMs, PCR4 contains information about the partition table and the bootloader. The allocation of these is defined by the PC Client working group of the Trusted Computing Group. However, once the boot loader takes over, we're outside the spec[1].

One important thing to note here is that the TPM doesn't actually have any ability to directly interfere with the boot process. If you try to boot modified code on a system, the TPM will contain different measurements but boot will still succeed. What the TPM can do is refuse to hand over secrets unless the measurements are correct. This allows for configurations where your disk encryption key can be stored in the TPM and then handed over automatically if the measurements are unaltered. If anybody interferes with your boot process then the measurements will be different, the TPM will refuse to hand over the key, your disk will remain encrypted and whoever's trying to compromise your machine will be sad.

The problem here is that a lot of things can affect the measurements. Upgrading your bootloader or kernel will do so. At that point if you reboot your disk fails to unlock and you become unhappy. To get around this your update system needs to notice that a new component is about to be installed, generate the new expected hashes and re-seal the secret to the TPM using the new hashes. If there are several different points in the update where this can happen, this can quite easily go wrong. And if it goes wrong, you're back to being unhappy.

Is there a way to improve this? Surprisingly, the answer is "yes" and the people to thank are Microsoft. Appendix A of a basically entirely unrelated spec defines a mechanism for storing the UEFI Secure Boot policy and used keys in PCR 7 of the TPM. The idea here is that you trust your OS vendor (since otherwise they could just backdoor your system anyway), so anything signed by your OS vendor is acceptable. If someone tries to boot something signed by a different vendor then PCR 7 will be different. If someone disables secure boot, PCR 7 will be different. If you upgrade your bootloader or kernel, PCR 7 will be the same. This simplifies things significantly.

I've put together a (not well-tested) patchset for Shim that adds support for including Shim's measurements in PCR 7. In conjunction with appropriate firmware, it should then be straightforward to seal secrets to PCR 7 and not worry about things breaking over system updates. This makes tying things like disk encryption keys to the TPM much more reasonable.

However, there's still one pretty major problem, which is that the initramfs (ie, the component responsible for setting up the disk encryption in the first place) isn't signed and isn't included in PCR 7[2]. An attacker can simply modify it to stash any TPM-backed secrets or mount the encrypted filesystem and then drop to a root prompt. This, uh, reduces the utility of the entire exercise.

The simplest solution to this that I've come up with depends on how Linux implements initramfs files. In its simplest form, an initramfs is just a cpio archive. In its slightly more complicated form, it's a compressed cpio archive. And in its peak form of evolution, it's a series of compressed cpio archives concatenated together. As the kernel reads each one in turn, it extracts it over the previous ones. That means that any files in the final archive will overwrite files of the same name in previous archives.

My proposal is to generate a small initramfs whose sole job is to get secrets from the TPM and stash them in the kernel keyring, and then measure an additional value into PCR 7 in order to ensure that the secrets can't be obtained again. Later disk encryption setup will then be able to set up dm-crypt using the secret already stored within the kernel. This small initramfs will be built into the signed kernel image, and the bootloader will be responsible for appending it to the end of any user-provided initramfs. This means that the TPM will only grant access to the secrets while trustworthy code is running - once the secret is in the kernel it will only be available for in-kernel use, and once PCR 7 has been modified the TPM won't give it to anyone else. A similar approach for some kernel command-line arguments (the kernel, module-init-tools and systemd all interpret the kernel command line left-to-right, with later arguments overriding earlier ones) would make it possible to ensure that certain kernel configuration options (such as the iommu) weren't overridable by an attacker.

There's obviously a few things that have to be done here (standardise how to embed such an initramfs in the kernel image, ensure that luks knows how to use the kernel keyring, teach all relevant bootloaders how to handle these images), but overall this should make it practical to use PCR 7 as a mechanism for supporting TPM-backed disk encryption secrets on Linux without introducing a hug support burden in the process.

[1] The patchset I've posted to add measured boot support to Grub use PCRs 8 and 9 to measure various components during the boot process, but other bootloaders may have different policies.

[2] This is because most Linux systems generate the initramfs locally rather than shipping it pre-built. It may also get rebuilt on various userspace updates, even if the kernel hasn't changed. Including it in PCR 7 would entirely break the fragility guarantees and defeat the point of all of this.

comment count unavailable comments

9 May 2017

Matthew Garrett: Intel AMT on wireless networks

More details about Intel's AMT vulnerablity have been released - it's about the worst case scenario, in that it's a total authentication bypass that appears to exist independent of whether the AMT is being used in Small Business or Enterprise modes (more background in my previous post here). One thing I claimed was that even though this was pretty bad it probably wasn't super bad, since Shodan indicated that there were only a small number of thousand machines on the public internet and accessible via AMT. Most deployments were probably behind corporate firewalls, which meant that it was plausibly a vector for spreading within a company but probably wasn't a likely initial vector.

I've since done some more playing and come to the conclusion that it's rather worse than that. AMT actually supports being accessed over wireless networks. Enabling this is a separate option - if you simply provision AMT it won't be accessible over wireless by default, you need to perform additional configuration (although this is as simple as logging into the web UI and turning on the option). Once enabled, there are two cases:
  1. The system is not running an operating system, or the operating system has not taken control of the wireless hardware. In this case AMT will attempt to join any network that it's been explicitly told about. Note that in default configuration, joining a wireless network from the OS is not sufficient for AMT to know about it - there needs to be explicit synchronisation of the network credentials to AMT. Intel provide a wireless manager that does this, but the stock behaviour in Windows (even after you've installed the AMT support drivers) is not to do this.
  2. The system is running an operating system that has taken control of the wireless hardware. In this state, AMT is no longer able to drive the wireless hardware directly and counts on OS support to pass packets on. Under Linux, Intel's wireless drivers do not appear to implement this feature. Under Windows, they do. This does not require any application level support, and uninstalling LMS will not disable this functionality. This also appears to happen at the driver level, which means it bypasses the Windows firewall.
Case 2 is the scary one. If you have a laptop that supports AMT, and if AMT has been provisioned, and if AMT has had wireless support turned on, and if you're running Windows, then connecting your laptop to a public wireless network means that AMT is accessible to anyone else on that network[1]. If it hasn't received a firmware update, they'll be able to do so without needing any valid credentials.

If you're a corporate IT department, and if you have AMT enabled over wifi, turn it off. Now.

[1] Assuming that the network doesn't block client to client traffic, of course

comment count unavailable comments

1 May 2017

Matthew Garrett: Intel's remote AMT vulnerablity

Intel just announced a vulnerability in their Active Management Technology stack. Here's what we know so far.
Background Intel chipsets for some years have included a Management Engine, a small microprocessor that runs independently of the main CPU and operating system. Various pieces of software run on the ME, ranging from code to handle media DRM to an implementation of a TPM. AMT is another piece of software running on the ME, albeit one that takes advantage of a wide range of ME features.
Active Management TechnologyAMT is intended to provide IT departments with a means to manage client systems. When AMT is enabled, any packets sent to the machine's wired network port on port 16992 or 16993 will be redirected to the ME and passed on to AMT - the OS never sees these packets. AMT provides a web UI that allows you to do things like reboot a machine, provide remote install media or even (if the OS is configured appropriately) get a remote console. Access to AMT requires a password - the implication of this vulnerability is that that password can be bypassed.
Remote managementAMT has two types of remote console: emulated serial and full graphical. The emulated serial console requires only that the operating system run a console on that serial port, while the graphical environment requires drivers on the OS side requires that the OS set a compatible video mode but is also otherwise OS-independent[2]. However, an attacker who enables emulated serial support may be able to use that to configure grub to enable serial console. Remote graphical console seems to be problematic under Linux but some people claim to have it working, so an attacker would be able to interact with your graphical console as if you were physically present. Yes, this is terrifying.
Remote mediaAMT supports providing an ISO remotely. In older versions of AMT (before 11.0) this was in the form of an emulated IDE controller. In 11.0 and later, this takes the form of an emulated USB device. The nice thing about the latter is that any image provided that way will probably be automounted if there's a logged in user, which probably means it's possible to use a malformed filesystem to get arbitrary code execution in the kernel. Fun!

The other part of the remote media is that systems will happily boot off it. An attacker can reboot a system into their own OS and examine drive contents at their leisure. This doesn't let them bypass disk encryption in a straightforward way[1], so you should probably enable that.
How bad is thisThat depends. Unless you've explicitly enabled AMT at any point, you're probably fine. The drivers that allow local users to provision the system would require administrative rights to install, so as long as you don't have them installed then the only local users who can do anything are the ones who are admins anyway. If you do have it enabled, though
How do I know if I have it enabled?Yeah this is way more annoying than it should be. First of all, does your system even support AMT? AMT requires a few things:

1) A supported CPU
2) A supported chipset
3) Supported network hardware
4) The ME firmware to contain the AMT firmware

Merely having a "vPRO" CPU and chipset isn't sufficient - your system vendor also needs to have licensed the AMT code. Under Linux, if lspci doesn't show a communication controller with "MEI" or "HECI" in the description, AMT isn't running and you're safe. If it does show an MEI controller, that still doesn't mean you're vulnerable - AMT may still not be provisioned. If you reboot you should see a brief firmware splash mentioning the ME. Hitting ctrl+p at this point should get you into a menu which should let you disable AMT.
How about over Wifi?Turning on AMT doesn't automatically turn it on for wifi. AMT will also only connect itself to networks it's been explicitly told about. Where things get more confusing is that once the OS is running, responsibility for wifi is switched from the ME to the OS and it forwards packets to AMT. I haven't been able to find good documentation on whether having AMT enabled for wifi results in the OS forwarding packets to AMT on all wifi networks or only ones that are explicitly configured.
What do we not know?We have zero information about the vulnerability, other than that it allows unauthenticated access to AMT. One big thing that's not clear at the moment is whether this affects all AMT setups, setups that are in Small Business Mode, or setups that are in Enterprise Mode. If the latter, the impact on individual end-users will be basically zero - Enterprise Mode involves a bunch of effort to configure and nobody's doing that for their home systems. If it affects all systems, or just systems in Small Business Mode, things are likely to be worse.
We now know that the vulnerability exists in all configurations.
What should I do?Make sure AMT is disabled. If it's your own computer, you should then have nothing else to worry about. If you're a Windows admin with untrusted users, you should also disable or uninstall LMS by following these instructions.
Does this mean every Intel system built since 2008 can be taken over by hackers?No. Most Intel systems don't ship with AMT. Most Intel systems with AMT don't have it turned on.
Does this allow persistent compromise of the system?Not in any novel way. An attacker could disable Secure Boot and install a backdoored bootloader, just as they could with physical access.
But isn't the ME a giant backdoor with arbitrary access to RAM?Yes, but there's no indication that this vulnerability allows execution of arbitrary code on the ME - it looks like it's just (ha ha) an authentication bypass for AMT.
Is this a big deal anyway?Yes. Fixing this requires a system firmware update in order to provide new ME firmware (including an updated copy of the AMT code). Many of the affected machines are no longer receiving firmware updates from their manufacturers, and so will probably never get a fix. Anyone who ever enables AMT on one of these devices will be vulnerable. That's ignoring the fact that firmware updates are rarely flagged as security critical (they don't generally come via Windows update), so even when updates are made available, users probably won't know about them or install them.
Avoiding this kind of thing in futureUsers ought to have full control over what's running on their systems, including the ME. If a vendor is no longer providing updates then it should at least be possible for a sufficiently desperate user to pay someone else to do a firmware build with the appropriate fixes. Leaving firmware updates at the whims of hardware manufacturers who will only support systems for a fraction of their useful lifespan is inevitably going to end badly.
How certain are you about any of this?Not hugely - the quality of public documentation on AMT isn't wonderful, and while I've spent some time playing with it (and related technologies) I'm not an expert. If anything above seems inaccurate, let me know and I'll fix it.

[1] Eh well. They could reboot into their own OS, modify your initramfs (because that's not signed even if you're using UEFI Secure Boot) such that it writes a copy of your disk passphrase to /boot before unlocking it, wait for you to type in your passphrase, reboot again and gain access. Sealing the encryption key to the TPM would avoid this.

[2] Updated after this comment - I thought I'd fixed this before publishing but left that claim in by accident.

(Updated to add the section on wifi)

(Updated to typo replace LSM with LMS)

(Updated to indicate that the vulnerability affects all configurations)

comment count unavailable comments

30 April 2017

Matthew Garrett: Looking at the Netgear Arlo home IP camera

Another in the series of looking at the security of IoT type objects. This time I've gone for the Arlo network connected cameras produced by Netgear, specifically the stock Arlo base system with a single camera. The base station is based on a Broadcom 5358 SoC with an 802.11n radio, along with a single Broadcom gigabit ethernet interface. Other than it only having a single ethernet port, this looks pretty much like a standard Netgear router. There's a convenient unpopulated header on the board that turns out to be a serial console, so getting a shell is only a few minutes work.

Normal setup is straight forward. You plug the base station into a router, wait for all the lights to come on and then you visit arlo.netgear.com and follow the setup instructions - by this point the base station has connected to Netgear's cloud service and you're just associating it to your account. Security here is straightforward: you need to be coming from the same IP address as the Arlo. For most home users with NAT this works fine. I sat frustrated as it repeatedly failed to find any devices, before finally moving everything behind a backup router (my main network isn't NATted) for initial setup. Once you and the Arlo are on the same IP address, the site shows you the base station's serial number for confirmation and then you attach it to your account. Next step is adding cameras. Each base station is broadcasting an 802.11 network on the 2.4GHz spectrum. You connect a camera by pressing the sync button on the base station and then the sync button on the camera. The camera associates with the base station via WPS and now you're up and running.

This is the point where I get bored and stop following instructions, but if you're using a desktop browser (rather than using the mobile app) you appear to need Flash in order to actually see any of the camera footage. Bleah.

But back to the device itself. The first thing I traced was the initial device association. What I found was that once the device is associated with an account, it can't be attached to another account. This is good - I can't simply request that devices be rebound to my account from someone else's. Further, while the serial number is displayed to the user to disambiguate between devices, it doesn't seem to be what's used internally. Tracing the logon traffic from the base station shows it sending a long random device ID along with an authentication token. If you perform a factory reset, these values are regenerated. The device to account mapping seems to be based on this random device ID, which means that once the device is reset and bound to another account there's no way for the initial account owner to regain access (other than resetting it again and binding it back to their account). This is far better than many devices I've looked at.

Performing a factory reset also changes the WPA PSK for the camera network. Newsky Security discovered that doing so originally reset it to 12345678, which is, uh, suboptimal? That's been fixed in newer firmware, along with their discovery that the original random password choice was not terribly random.

All communication from the base station to the cloud seems to be over SSL, and everything validates certificates properly. This also seems to be true for client communication with the cloud service - camera footage is streamed back over port 443 as well.

Most of the functionality of the base station is provided by two daemons, xagent and vzdaemon. xagent appears to be responsible for registering the device with the cloud service, while vzdaemon handles the camera side of things (including motion detection). All of this is running as root, so in the event of any kind of vulnerability the entire platform is owned. For such a single purpose device this isn't really a big deal (the only sensitive data it has is the camera feed - if someone has access to that then root doesn't really buy them anything else). They're statically linked and stripped so I couldn't be bothered spending any significant amount of time digging into them. In any case, they don't expose any remotely accessible ports and only connect to services with verified SSL certificates. They're probably not a big risk.

Other than the dependence on Flash, there's nothing immediately concerning here. What is a little worrying is a family of daemons running on the device and listening to various high numbered UDP ports. These appear to be provided by Broadcom and a standard part of all their router platforms - they're intended for handling various bits of wireless authentication. It's not clear why they're listening on 0.0.0.0 rather than 127.0.0.1, and it's not obvious whether they're vulnerable (they mostly appear to receive packets from the driver itself, process them and then stick packets back into the kernel so who knows what's actually going on), but since you can't set one of these devices up in the first place without it being behind a NAT gateway it's unlikely to be of real concern to most users. On the other hand, the same daemons seem to be present on several Broadcom-based router platforms where they may end up being visible to the outside world. That's probably investigation for another day, though.

Overall: pretty solid, frustrating to set up if your network doesn't match their expectations, wouldn't have grave concerns over having it on an appropriately firewalled network.

(Edited to replace a mistaken reference to WDS with WPS)

comment count unavailable comments

11 April 2017

Matthew Garrett: Disabling SSL validation in binary apps

Reverse engineering protocols is a great deal easier when they're not encrypted. Thankfully most apps I've dealt with have been doing something convenient like using AES with a key embedded in the app, but others use remote protocols over HTTPS and that makes things much less straightforward. MITMProxy will solve this, as long as you're able to get the app to trust its certificate, but if there's a built-in pinned certificate that's going to be a pain. So, given an app written in C running on an embedded device, and without an easy way to inject new certificates into that device, what do you do?

First: The app is probably using libcurl, because it's free, works and is under a license that allows you to link it into proprietary apps. This is also bad news, because libcurl defaults to having sensible security settings. In the worst case we've got a statically linked binary with all the symbols stripped out, so we're left with the problem of (a) finding the relevant code and (b) replacing it with modified code. Fortuntely, this is much less difficult than you might imagine.

First, let's find where curl sets up its defaults. Curl_init_userdefined() in curl/lib/url.c has the following code:
set->ssl.primary.verifypeer = TRUE;
set->ssl.primary.verifyhost = TRUE;
#ifdef USE_TLS_SRP
set->ssl.authtype = CURL_TLSAUTH_NONE;
#endif
set->ssh_auth_types = CURLSSH_AUTH_DEFAULT; /* defaults to any auth
type */
set->general_ssl.sessionid = TRUE; /* session ID caching enabled by
default */
set->proxy_ssl = set->ssl;

set->new_file_perms = 0644; /* Default permissions */
set->new_directory_perms = 0755; /* Default permissions */

TRUE is defined as 1, so we want to change the code that currently sets verifypeer and verifyhost to 1 to instead set them to 0. How to find it? Look further down - new_file_perms is set to 0644 and new_directory_perms is set to 0755. The leading 0 indicates octal, so these correspond to decimal 420 and 493. Passing the file to objdump -d (assuming a build of objdump that supports this architecture) will give us a disassembled version of the code, so time to fix our problems with grep:
objdump -d target grep --after=20 ,420 grep ,493

This gives us the disassembly of target, searches for any occurrence of ",420" (indicating that 420 is being used as an argument in an instruction), prints the following 20 lines and then searches for a reference to 493. It spits out a single hit:
43e864: 240301ed li v1,493
Which is promising. Looking at the surrounding code gives:
43e820: 24030001 li v1,1
43e824: a0430138 sb v1,312(v0)
43e828: 8fc20018 lw v0,24(s8)
43e82c: 24030001 li v1,1
43e830: a0430139 sb v1,313(v0)
43e834: 8fc20018 lw v0,24(s8)
43e838: ac400170 sw zero,368(v0)
43e83c: 8fc20018 lw v0,24(s8)
43e840: 2403ffff li v1,-1
43e844: ac4301dc sw v1,476(v0)
43e848: 8fc20018 lw v0,24(s8)
43e84c: 24030001 li v1,1
43e850: a0430164 sb v1,356(v0)
43e854: 8fc20018 lw v0,24(s8)
43e858: 240301a4 li v1,420
43e85c: ac4301e4 sw v1,484(v0)
43e860: 8fc20018 lw v0,24(s8)
43e864: 240301ed li v1,493
43e868: ac4301e8 sw v1,488(v0)

Towards the end we can see 493 being loaded into v1, and v1 then being copied into an offset from v0. This looks like a structure member being set to 493, which is what we expected. Above that we see the same thing being done to 420. Further up we have some more stuff being set, including a -1 - that corresponds to CURLSSH_AUTH_DEFAULT, so we seem to be in the right place. There's a zero above that, which corresponds to CURL_TLSAUTH_NONE. That means that the two 1 operations above the -1 are the code we want, and simply changing 43e820 and 43e82c to 24030000 instead of 24030001 means that our targets will be set to 0 (ie, FALSE) rather than 1 (ie, TRUE). Copy the modified binary back to the device, run it and now it happily talks to MITMProxy. Huge success.

(If the app calls Curl_setopt() to reconfigure the state of these values, you'll need to stub those out as well - thankfully, recent versions of curl include a convenient string "CURLOPT_SSL_VERIFYHOST no longer supports 1 as value!" in this function, so if the code in question is using semi-recent curl it's easy to find. Then it's just a matter of looking for the constants that CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER are set to, following the jumps and hacking the code to always set them to 0 regardless of the argument)

comment count unavailable comments

9 April 2017

Matthew Garrett: A quick look at the Ikea Tr dfri lighting platform

Ikea recently launched their Tr dfri smart lighting platform in the US. The idea of Ikea plus internet security together at last seems like a pretty terrible one, but having taken a look it's surprisingly competent. Hardware-wise, the device is pretty minimal - it seems to be based on the Cypress[1] WICED IoT platform, with 100MBit ethernet and a Silicon Labs Zigbee chipset. It's running the Express Logic ThreadX RTOS, has no running services on any TCP ports and appears to listen on two single UDP ports. As IoT devices go, it's pleasingly minimal.

That single port seems to be a COAP server running with DTLS and a pre-shared key that's printed on the bottom of the device. When you start the app for the first time it prompts you to scan a QR code that's just a machine-readable version of that key. The Android app has code for using the insecure COAP port rather than the encrypted one, but the device doesn't respond to queries there so it's presumably disabled in release builds. It's also local only, with no cloud support. You can program timers, but they run on the device. The only other service it seems to run is an mdns responder, which responds to the _coap._udp.local query to allow for discovery.

From a security perspective, this is pretty close to ideal. Having no remote APIs means that security is limited to what's exposed locally. The local traffic is all encrypted. You can only authenticate with the device if you have physical access to read the (decently long) key off the bottom. I haven't checked whether the DTLS server is actually well-implemented, but it doesn't seem to respond unless you authenticate first which probably covers off a lot of potential risks. The SoC has wireless support, but it seems to be disabled - there's no antenna on board and no mechanism for configuring it.

However, there's one minor issue. On boot the device grabs the current time from pool.ntp.org (fine) but also hits http://fw.ota.homesmart.ikea.net/feed/version_info.json . That file contains a bunch of links to firmware updates, all of which are also downloaded over http (and not https). The firmware images themselves appear to be signed, but downloading untrusted objects and then parsing them isn't ideal. Realistically, this is only a problem if someone already has enough control over your network to mess with your DNS, and being wired-only makes this pretty unlikely. I'd be surprised if it's ever used as a real avenue of attack.

Overall: as far as design goes, this is one of the most secure IoT-style devices I've looked at. I haven't examined the COAP stack in detail to figure out whether it has any exploitable bugs, but the attack surface is pretty much as minimal as it could be while still retaining any functionality at all. I'm impressed.

[1] Formerly Broadcom

comment count unavailable comments

28 March 2017

Reproducible builds folks: Reproducible Builds: week 100 in Stretch cycle

Welcome to the 100th issue of this weekly news! We hope you have been enjoying our posts and would love to receive some feedback via our mailing list! Anyway, here's what happened in the Reproducible Builds effort between Sunday March 19 and Saturday March 25 2017: Reproducible Builds Hackathon Hamburg 2017 The Reproducible Builds Hamburg Hackathon 2017 (or RB-HH-2017 for short) is a 3-day hacking event taking place May 5th-7th in the CCC Hamburg Hackerspace inside Frappant, as collective art space located witin a historical monument in Hamburg, Germany. The event is open to everyone and we still have some free seats. If you wish to attend, please register your interest as soon as possible. Media coverage Packages reviewed and fixed, and bugs filed Chris Lamb: Reviews of unreproducible packages 30 package reviews have been added, 13 have been updated and 2 have been removed in this week, adding to our knowledge about identified issues. Weekly QA work During our reproducibility testing, FTBFS bugs have been detected and reported by: diffoscope development buildinfo.debian.net development Misc. This week's edition was written by Chris Lamb & Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

21 March 2017

Matthew Garrett: Announcing the Shim review process

Shim has been hugely successful, to the point of being used by the majority of significant Linux distributions and many other third party products (even, apparently, Solaris). The aim was to ensure that it would remain possible to install free operating systems on UEFI Secure Boot platforms while still allowing machine owners to replace their bootloaders and kernels, and it's achieved this goal.

However, a legitimate criticism has been that there's very little transparency in Microsoft's signing process. Some people have waited for significant periods of time before being receiving a response. A large part of this is simply that demand has been greater than expected, and Microsoft aren't in the best position to review code that they didn't write in the first place.

To that end, we're adopting a new model. A mailing list has been created at shim-review@lists.freedesktop.org, and members of this list will review submissions and provide a recommendation to Microsoft on whether these should be signed or not. The current set of expectations around binaries to be signed documented here and the current process here - it is expected that this will evolve slightly as we get used to the process, and we'll provide a more formal set of documentation once things have settled down.

This is a new initiative and one that will probably take a little while to get working smoothly, but we hope it'll make it much easier to get signed releases of Shim out without compromising security in the process.

comment count unavailable comments

20 March 2017

Matthew Garrett: Buying a Utah teapot

The Utah teapot was one of the early 3D reference objects. It's canonically a Melitta but hasn't been part of their range in a long time, so I'd been watching Ebay in the hope of one turning up. Until last week, when I discovered that a company called Friesland had apparently bought a chunk of Melitta's range some years ago and sell the original teapot[1]. I've just ordered one, and am utterly unreasonably excited about this.

Update: Friesland have apparently always produced the Utah teapot, but were part of the Melitta group for some time - they didn't buy the range from Melitta.

[1] They have them in 0.35, 0.85 and 1.4 litre sizes. I believe (based on the measurements here) that the 1.4 litre one matches the Utah teapot.

comment count unavailable comments

8 March 2017

Matthew Garrett: The Internet of Microphones

So the CIA has tools to snoop on you via your TV and your Echo is testifying in a murder case and yet people are still buying connected devices with microphones in and why are they doing that the world is on fire surely this is terrible?

You're right that the world is terrible, but this isn't really a contributing factor to it. There's a few reasons why. The first is that there's really not any indication that the CIA and MI5 ever turned this into an actual deployable exploit. The development reports[1] describe a project that still didn't know what would happen to their exploit over firmware updates and a "fake off" mode that left a lit LED which wouldn't be there if the TV were actually off, so there's a potential for failed updates and people noticing that there's something wrong. It's certainly possible that development continued and it was turned into a polished and usable exploit, but it really just comes across as a bunch of nerds wanting to show off a neat demo.

But let's say it did get to the stage of being deployable - there's still not a great deal to worry about. No remote infection mechanism is described, so they'd need to do it locally. If someone is in a position to reflash your TV without you noticing, they're also in a position to, uh, just leave an internet connected microphone of their own. So how would they infect you remotely? TVs don't actually consume a huge amount of untrusted content from arbitrary sources[2], so that's much harder than it sounds and probably not worth it because:

YOU ARE CARRYING AN INTERNET CONNECTED MICROPHONE THAT CONSUMES VAST QUANTITIES OF UNTRUSTED CONTENT FROM ARBITRARY SOURCES

Seriously your phone is like eleven billion times easier to infect than your TV is and you carry it everywhere. If the CIA want to spy on you, they'll do it via your phone. If you're paranoid enough to take the battery out of your phone before certain conversations, don't have those conversations in front of a TV with a microphone in it. But, uh, it's actually worse than that.

These days audio hardware usually consists of a very generic codec containing a bunch of digital analogue converters, some analogue digital converters and a bunch of io pins that can basically be wired up in arbitrary ways. Hardcoding the roles of these pins makes board layout more annoying and some people want more inputs than outputs and some people vice versa, so it's not uncommon for it to be possible to reconfigure an input as an output or vice versa. From software.

Anyone who's ever plugged a microphone into a speaker jack probably knows where I'm going with this. An attacker can "turn off" your TV, reconfigure the internal speaker output as an input and listen to you on your "microphoneless" TV. Have a nice day, and stop telling people that putting glue in their laptop microphone is any use unless you're telling them to disconnect the internal speakers as well.

If you're in a situation where you have to worry about an intelligence agency monitoring you, your TV is the least of your concerns - any device with speakers is just as bad. So what about Alexa? The summary here is, again, it's probably easier and more practical to just break your phone - it's probably near you whenever you're using an Echo anyway, and they also get to record you the rest of the time. The Echo platform is very restricted in terms of where it gets data[3], so it'd be incredibly hard to compromise without Amazon's cooperation. Amazon's not going to give their cooperation unless someone turns up with a warrant, and then we're back to you already being screwed enough that you should have got rid of all your electronics way earlier in this process. There are reasons to be worried about always listening devices, but intelligence agencies monitoring you shouldn't generally be one of them.

tl;dr: The CIA probably isn't listening to you through your TV, and if they are then you're almost certainly going to have a bad time anyway.

[1] Which I have obviously not read
[2] I look forward to the first person demonstrating code execution through malformed MPEG over terrestrial broadcast TV
[3] You'd need a vulnerability in its compressed audio codecs, and you'd need to convince the target to install a skill that played content from your servers

comment count unavailable comments

27 February 2017

Matthew Garrett: The Fantasyland Code of Professionalism is an abuser's fantasy

The Fantasyland Institute of Learning is the organisation behind Lambdaconf, a functional programming conference perhaps best known for standing behind a racist they had invited as a speaker. The fallout of that has resulted in them trying to band together events in order to reduce disruption caused by sponsors or speakers declining to be associated with conferences that think inviting racists is more important than the comfort of non-racists, which is weird in all sorts of ways but not what I'm talking about here because they've also written a "Code of Professionalism" which is like a Code of Conduct except it protects abusers rather than minorities and no really it is genuinely as bad as it sounds.

The first thing you need to know is that the document uses its own jargon. Important here are the concepts of active and inactive participation - active participation is anything that you do within the community covered by a specific instance of the Code, inactive participation is anything that happens anywhere ever (ie, active participation is a subset of inactive participation). The restrictions based around active participation are broadly those that you'd expect in a very weak code of conduct - it's basically "Don't be mean", but with some quirks. The most significant is that there's a "Don't moralise" provision, which as written means saying "I think people who support slavery are bad" in a community setting is a violation of the code, but the description of discrimination means saying "I volunteer to mentor anybody from a minority background" could also result in any community member not from a minority background complaining that you've discriminated against them. It's just not very good.

Inactive participation is where things go badly wrong. If you engage in community or professional sabotage, or if you shame a member based on their behaviour inside the community, that's a violation. Community sabotage isn't defined and so basically allows a community to throw out whoever they want to. Professional sabotage means doing anything that can hurt a member's professional career. Shaming is saying anything negative about a member to a non-member if that information was obtained from within the community.

So, what does that mean? Here are some things that you are forbidden from doing:
Now, clearly, some of these are unintentional - I don't think the authors of this policy would want to defend the idea that you can't report something to the police, and I'm sure they'd be willing to modify the document to permit this. But it's indicative of the mindset behind it. This policy has been written to protect people who are accused of doing something bad, not to protect people who have something bad done to them.

There are other examples of this. For instance, violations are not publicised unless the verdict is that they deserve banishment. If a member harasses another member but is merely given a warning, the victim is still not permitted to tell anyone else that this happened. The perpetrator is then free to repeat their behaviour in other communities, and the victim has to choose between either staying silent or warning them and risk being banished from the community for shaming.

If you're an abuser then this is perfect. You're in a position where your victims have to choose between their career (which will be harmed if they're unable to function in the community) and preventing the same thing from happening to others. Many will choose the former, which gives you far more freedom to continue abusing others. Which means that communities adopting the Fantasyland code will be more attractive to abusers, and become disproportionately populated by them.

I don't believe this is the intent, but it's an inevitable consequence of the priorities inherent in this code. No matter how many corner cases are cleaned up, if a code prevents you from saying bad things about people or communities it prevents people from being able to make informed choices about whether that community and its members are people they wish to associate with. When there are greater consequences to saying someone's racist than them being racist, you're fucking up badly.

comment count unavailable comments

Next.

Previous.